
    Implementing Multithreaded Protocols for Release Consistency on Top of the Generic DSM-PM2 Platform

    10.1007/3-540-47840-X_18
    DSM-PM2 is an implementation platform designed to facilitate experimental studies of consistency protocols for distributed shared memory. The platform provides basic building blocks that allow easy design, implementation, and evaluation of a large variety of multithreaded consistency protocols within a unified framework. DSM-PM2 is portable across a large variety of cluster architectures and supports various communication interfaces (TCP, MPI, BIP, SCI, VIA, etc.). This paper presents the design of two multithreaded protocols implementing the release consistency model. We evaluate the impact of these consistency protocols on the overall performance of a typical distributed application on two clusters with different interconnection networks and communication interfaces.
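    The release consistency model mentioned above ties the visibility of a node's writes to explicit synchronization operations: plain writes performed before a release are guaranteed visible to whoever next performs a matching acquire. The same acquire/release discipline exists in C11 shared-memory atomics, so the self-contained sketch below can serve as a single-machine analogy; it is not DSM-PM2 code, and a DSM protocol enforces the rule at page granularity across cluster nodes rather than between threads.

```c
/*
 * Single-machine analogy for the release consistency model using C11
 * atomics: ordinary writes made before a release-store become visible
 * to a thread that performs a matching acquire-load. This is only an
 * illustration of the model, not DSM-PM2 code.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int data;                /* ordinary (non-atomic) shared datum */
static atomic_int ready;        /* synchronization variable           */

static void *producer(void *arg)
{
    (void)arg;
    data = 42;                                                /* plain write */
    atomic_store_explicit(&ready, 1, memory_order_release);   /* "release"   */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                                                     /* "acquire"   */
    printf("data = %d\n", data);                              /* prints 42   */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```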

    Towards an efficient process placement policy for MPI applications in multicore environments

    This paper presents a method to efficiently place MPI processes on multicore machines. Since MPI implementations often feature efficient support for both shared-memory and network communication, an adequate placement policy is a crucial step toward improving application performance. As a case study, we show the results obtained for several NAS computing kernels and explain how the policy influences overall performance. In particular, we found that a policy that merely increases the intranode communication ratio is not enough and that cache utilization is also an influential factor. A more sophisticated policy (e.g., one that takes the architecture's memory structure into account) is required to observe performance improvements.
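    The intranode/internode distinction that such a placement policy exploits can be observed from within an application via the standard MPI-3 call MPI_Comm_split_type. The sketch below is not the paper's placement tool; it only shows how the ranks sharing a node (and hence able to communicate through shared memory) can be grouped into a communicator, which is the basic information a locality-aware mapping step builds on.

```c
/* Discover which MPI ranks share a node using standard MPI-3. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, node_rank, node_size;
    MPI_Comm node_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group the ranks that can communicate through shared memory. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED,
                        world_rank, MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    printf("global rank %d is local rank %d of %d on its node\n",
           world_rank, node_rank, node_size);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}
```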

    A runtime approach to dynamic resource allocation for sparse direct solvers

    To face the advent of multicore processors and the ever-increasing complexity of hardware architectures, programming models based on DAG-of-tasks parallelism have regained popularity in the high-performance scientific computing community. In this context, enabling HPC applications to perform efficiently when dealing with graphs of parallel tasks that could potentially run simultaneously is a great challenge. Even if a uniform runtime system is used underneath, scheduling multiple parallel tasks over the same set of hardware resources introduces many issues, such as undesirable cache flushes or memory bus contention. In this paper, we show how runtime system-based scheduling contexts can be used to dynamically enforce locality of parallel tasks on multicore machines. We extend an existing generic sparse direct solver to use our mechanism and introduce a new decomposition method, based on proportional mapping, that is used to build the scheduling contexts. We propose a runtime-level dynamic context management policy to cope with the very irregular behavior of the application. A detailed performance analysis shows significant performance improvements of the solver on various multicore hardware.
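    Proportional mapping is only named in the abstract, so the following self-contained sketch illustrates the general idea under simple assumptions: the workers owned by a node of the elimination tree are divided among its children in proportion to each child's subtree workload, yielding one worker set (one scheduling context) per subtree. The data structure, rounding rule, and names are illustrative, not the solver's actual code.

```c
/* Illustrative proportional mapping over a (tiny) elimination tree. */
#include <stdio.h>

#define MAX_CHILDREN 4

struct tree_node {
    double work;                              /* workload of the whole subtree   */
    int nchildren;
    struct tree_node *child[MAX_CHILDREN];
    int first_worker, nworkers;               /* worker range mapped to the node */
};

static void proportional_map(struct tree_node *n, int first, int count)
{
    n->first_worker = first;
    n->nworkers = count;

    double total = 0.0;
    for (int i = 0; i < n->nchildren; i++)
        total += n->child[i]->work;
    if (total <= 0.0)
        return;

    int next = first;
    for (int i = 0; i < n->nchildren; i++) {
        int remaining = first + count - next;
        /* Non-last children get a share proportional to their workload;
         * the last child absorbs the rounding leftovers. */
        int share = (i == n->nchildren - 1)
                  ? remaining
                  : (int)(count * n->child[i]->work / total);
        if (share > remaining)
            share = remaining;
        proportional_map(n->child[i], next, share);
        next += share;
    }
}

int main(void)
{
    struct tree_node left  = { 30.0, 0, { 0 }, 0, 0 };
    struct tree_node right = { 10.0, 0, { 0 }, 0, 0 };
    struct tree_node root  = { 40.0, 2, { &left, &right }, 0, 0 };

    proportional_map(&root, 0, 8);   /* 8 workers: 6 for left, 2 for right */
    printf("left: %d workers from %d, right: %d workers from %d\n",
           left.nworkers, left.first_worker, right.nworkers, right.first_worker);
    return 0;
}
```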

    Composing multiple StarPU applications over heterogeneous machines: a supervised approach (Third International Workshop on Accelerators and Hybrid Exascale Systems, 2013)

    Enabling HPC applications to perform efficiently when invoking multiple parallel libraries simultaneously is a great challenge. Even if a single runtime system is used underneath, scheduling tasks or threads coming from different libraries over the same set of hardware resources introduces many issues, such as resource oversubscription, undesirable cache flushes, or memory bus contention. This paper presents an extension of StarPU, a runtime system specifically designed for heterogeneous architectures, that allows multiple parallel codes to run concurrently with minimal interference. Such parallel codes run within scheduling contexts that provide confined execution environments and can be used to partition computing resources. Scheduling contexts can be dynamically resized to optimize the allocation of computing resources among concurrently running libraries. We introduce a hypervisor that automatically expands or shrinks contexts using feedback from the runtime system (e.g., resource utilization). We demonstrate the relevance of our approach using benchmarks that invoke multiple high-performance linear algebra kernels simultaneously on top of heterogeneous multicore machines. We show that our mechanism can dramatically improve overall application run time (-34%), most notably by reducing the average cache miss ratio (-50%).
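    As a rough illustration of the feedback-driven resizing such a hypervisor performs, the self-contained sketch below moves one worker from a mostly idle scheduling context to a saturated one when the measured idle-time gap exceeds a threshold. The data structures, threshold, and names are assumptions made for illustration only; they are not StarPU's hypervisor API.

```c
/* Feedback-driven context resizing, reduced to its decision logic. */
#include <stdio.h>

struct sched_ctx {
    const char *name;
    int nworkers;        /* workers currently assigned to the context   */
    double idle_ratio;   /* measured fraction of worker idle time (0..1) */
};

/* Move one worker from the idler context to the busier one when the
 * imbalance exceeds a threshold and the donor keeps at least one worker. */
static void rebalance(struct sched_ctx *a, struct sched_ctx *b, double threshold)
{
    struct sched_ctx *idle = (a->idle_ratio > b->idle_ratio) ? a : b;
    struct sched_ctx *busy = (idle == a) ? b : a;

    if (idle->idle_ratio - busy->idle_ratio > threshold && idle->nworkers > 1) {
        idle->nworkers--;
        busy->nworkers++;
        printf("moved one worker from %s to %s\n", idle->name, busy->name);
    }
}

int main(void)
{
    struct sched_ctx cholesky = { "cholesky", 6, 0.05 };
    struct sched_ctx qr       = { "qr",       6, 0.60 };

    rebalance(&cholesky, &qr, 0.25);   /* qr is idle, so it donates a worker */
    printf("cholesky: %d workers, qr: %d workers\n",
           cholesky.nworkers, qr.nworkers);
    return 0;
}
```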

    Enabling Java for high-performance computing


    The PEPPHER approach to programmability and performance portability for heterogeneous many-core architectures

    The European FP7 project PEPPHER addresses programmability and performance portability for current and emerging heterogeneous many-core architectures. As its main idea, the project proposes a multi-level parallel execution model comprising potentially parallelized components that exist in variants suitable for different types of cores, memory configurations, input characteristics, and optimization criteria, and couples this with dynamic and static resource- and architecture-aware scheduling mechanisms. Crucial to PEPPHER is that components can be made performance-aware, allowing more efficient dynamic and static scheduling on the concrete, available resources. The flexibility provided by the software model, combined with a customizable, heterogeneous, memory- and topology-aware run-time system, is key to efficiently exploiting the resources of each concrete hardware configuration. The project takes a holistic approach, relying on existing paradigms, interfaces, and languages for the parallelization of components, and develops a prototype framework, a methodology for extending the framework, and guidelines for constructing performance-portable software and systems, including paths for migration of existing software, for heterogeneous many-core processors. This paper gives a high-level project overview and presents a specific example showing how the PEPPHER component variant model and resource-aware run-time system enable performance portability of a numerical kernel.
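    To make the component-variant idea concrete, the small sketch below registers two implementation variants of one operation, tags each with the class of resource or problem size it targets, and selects a variant at call time using a trivial performance-aware rule. All names, the registry layout, and the selection rule are illustrative assumptions, not the PEPPHER framework's API.

```c
/* One component interface, several variants, and a simple selector. */
#include <stdio.h>
#include <string.h>

typedef void (*vector_scale_fn)(float *x, int n, float a);

static void vector_scale_small(float *x, int n, float a)
{
    for (int i = 0; i < n; i++)            /* plain variant for small inputs */
        x[i] *= a;
}

static void vector_scale_large(float *x, int n, float a)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {           /* unrolled variant for larger n  */
        x[i] *= a; x[i + 1] *= a; x[i + 2] *= a; x[i + 3] *= a;
    }
    for (; i < n; i++)
        x[i] *= a;
}

struct variant {
    const char *target;                    /* e.g. "cpu-small", "cpu-large"  */
    vector_scale_fn fn;
};

static const struct variant variants[] = {
    { "cpu-small", vector_scale_small },
    { "cpu-large", vector_scale_large },
};

/* A trivial performance-aware selection rule based on problem size. */
static vector_scale_fn select_variant(int n)
{
    const char *target = (n < 1024) ? "cpu-small" : "cpu-large";
    for (size_t i = 0; i < sizeof variants / sizeof variants[0]; i++)
        if (strcmp(variants[i].target, target) == 0)
            return variants[i].fn;
    return variants[0].fn;
}

int main(void)
{
    float data[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    select_variant(8)(data, 8, 2.0f);
    printf("data[7] = %g\n", data[7]);
    return 0;
}
```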

    PEPPHER: Performance Portability and Programmability for Heterogeneous Many-Core Architectures

    PEPPHER takes a pluralistic and parallelization-agnostic approach to programmability and performance portability for heterogeneous many-core architectures. The PEPPHER framework is in principle language independent but focuses on supporting C++ code with PEPPHER-specific annotations as pragmas or external annotations. The framework is open and extensible; the PEPPHER methodology consists of rules for extending the framework to new architectures, mainly concerning adaptivity and autotuning for algorithm libraries, the necessary hooks and extensions for the run-time system, and any supporting algorithms and data structures these rely on. Offloading is a specific technique for programming heterogeneous platforms that can sometimes be applied with high efficiency. Offload, as developed by the PEPPHER partner Codeplay, is a particular, nonintrusive C++ extension that allows portable C++ code to support diverse heterogeneous multicore architectures in a single code base.